AITopics | audio chunk


audio chunk

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

i-LAVA: Insights on Low Latency Voice-2-Voice Architecture for Agents

Purwar, Anupam, Choudhary, Aditya

arXiv.org Artificial Intelligence Sep-30-2025

We experiment with a low-latency, end-to-end voice-to-voice communication model to optimize it for real-time conversational applications. By analyzing the components essential to a voice-to-voice (V-2-V) system, viz. automatic speech recognition (ASR), text-to-speech (TTS), and dialog management, our work examines how to reduce processing time while maintaining high-quality interactions, and identifies the levers for optimizing a V-2-V system. We find that the TTS component, which generates life-like voice full of emotion, including natural pauses and exclamations, has the highest impact on the real-time factor (RTF). The V-2-V architecture we experimented with uses CSM1b, which can understand the tone as well as the context of a conversation by ingesting both the audio and the text of prior exchanges to generate contextually accurate speech. We explored optimizing the number of Residual Vector Quantization (RVQ) iterations performed by the TTS decoder, which comes at the cost of a decrease in the quality of the generated voice. Our experimental evaluations also demonstrate that, for V-2-V implementations based on CSM, the most important optimizations come from reducing the number of RVQ iterations along with the codebooks used in Mimi.
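The abstract uses the real-time factor as its latency measure without defining it. A minimal sketch, assuming the common definition (processing time divided by the duration of the audio produced); the function name and the timing values below are hypothetical and are not taken from the paper:

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by the duration of the audio produced.

    Assumes the common definition of RTF; the paper may measure it differently.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical measurement: 1.5 s of compute to synthesize 3.0 s of speech.
rtf = real_time_factor(1.5, 3.0)

# Under this definition, values below 1.0 mean faster than real time.
print(f"RTF = {rtf:.2f}")
```

Under this definition, any change that shortens TTS decoding for the same amount of generated audio lowers the RTF.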

large language model, machine learning, rvq iteration, (15 more...)

arXiv.org Artificial Intelligence

2509.20971

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.87)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


Automated Detection of Sport Highlights from Audio and Video Sources

Della Santa, Francesco, Lalli, Morgana

arXiv.org Artificial Intelligence Jan-31-2025

This study presents a novel Deep Learning-based and lightweight approach for the automated detection of sports highlights (HLs) from audio and video sources. HL detection is a key task in sports video analysis, traditionally requiring significant human effort. Our solution leverages Deep Learning (DL) models trained on relatively small datasets of audio Mel-spectrograms and grayscale video frames, achieving promising accuracy rates of 89% and 83% for audio and video detection, respectively. The use of small datasets, combined with simple architectures, demonstrates the practicality of our method for fast and cost-effective deployment. Furthermore, an ensemble model combining both modalities shows improved robustness against false positives and false negatives. The proposed methodology offers a scalable solution for automated HL detection across various types of sports video content, reducing the need for manual intervention. Future work will focus on enhancing model architectures and extending this approach to broader scene-detection tasks in media analysis.

artificial intelligence, detection, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2501.161

Country:

Europe > Italy > Piedmont > Turin Province > Turin (0.04)
Europe > Italy > Lazio > Rome (0.04)

Genre: Research Report (0.82)

Industry:

Media (1.00)
Leisure & Entertainment > Sports (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Whispy: Adapting STT Whisper Models to Real-Time Environments

Bevilacqua, Antonio, Saviano, Paolo, Amirante, Alessandro, Romano, Simon Pietro

arXiv.org Artificial Intelligence May-6-2024

Large general-purpose transformer models have recently become the mainstay in the realm of speech analysis. In particular, Whisper achieves state-of-the-art results in relevant tasks such as speech recognition, translation, language identification, and voice activity detection. However, Whisper models are not designed to be used in real-time conditions, and this limitation makes them unsuitable for a wide range of practical applications. In this paper, we introduce Whispy, a system intended to bring live capabilities to the Whisper pretrained models. As a result of a number of architectural optimisations, Whispy is able to consume live audio streams and generate high-level, coherent voice transcriptions, while still maintaining a low computational cost. We evaluate the performance of our system on a large repository of publicly available speech datasets, investigating how the transcription mechanism introduced by Whispy affects the Whisper output. Experimental results show that Whispy excels in robustness, promptness, and accuracy.

dataset, transcription, whispy, (16 more...)

arXiv.org Artificial Intelligence

2405.03484

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy (0.04)
Asia > Indonesia > Bali (0.04)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry:

Information Technology (1.00)
Leisure & Entertainment (0.69)
Media > Radio (0.55)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Information Management (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Emotion Detection And Analysis

#artificialintelligence Dec-4-2022, 18:45:10 GMT

Emotion Detection and Analysis is a web application developed by the team The Mystic Forces as their final project for the AI5: productionizing AI course at Univ.ai, under the guidance of Pavlos Protopapas (Scientific Program Director at the Institute for Applied Computational Science (IACS) at Harvard University) and Shivas Jayaram (Research @Harvard IACS Deep Learning Researcher, Educator, and Practitioner). The web application is a deep learning project implemented end to end. Public speaking is not just a skill but an art that is not easily mastered, and it has become essential for everyone. In this digital world, where your office is your computer screen and online meeting platforms are the places to connect, one is regularly asked to deliver presentations and briefings and to attend meetings.

dataset, emotion detection, frontend, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Vehicle Sound Classification Using Deep Learning - Analytics Vidhya

#artificialintelligence Feb-7-2022, 10:55:06 GMT

One of the most critical parameters of an audio signal is amplitude. Amplitude can be defined as the maximum displacement from the rest position; the rest position is sometimes also called the central position.
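The definition above can be made concrete with a small sketch. It is not code from the article: the function name `peak_amplitude`, the 100 Hz test tone, and the 8000 Hz sample rate are all assumptions chosen for illustration.

```python
import math


def peak_amplitude(samples, rest_position=0.0):
    """Return the largest absolute displacement of any sample from the rest position."""
    if not samples:
        raise ValueError("samples must not be empty")
    return max(abs(sample - rest_position) for sample in samples)


# Hypothetical test signal: one second of a 100 Hz sine wave scaled to 0.8,
# sampled at 8000 Hz and centred on a rest position of 0.0.
sample_rate = 8000
tone = [0.8 * math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(sample_rate)]

# The result should be very close to the 0.8 scale factor used above.
print(peak_amplitude(tone))
```

For a recording whose rest position is not zero (for example, one with a constant offset), passing the signal's mean as `rest_position` keeps the measurement relative to the central position.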

audio chunk, audio data, spectrogram, (13 more...)

#artificialintelligence

Country: Asia > India > Jharkhand > Dhanbad (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)


The AI that brought the Beatles and Cole Porter back to life

#artificialintelligence Oct-11-2016, 03:25:09 GMT

It may sound like a lost track from The Beatles, but the catchy pop song, 'Daddy's Car', was composed by artificial intelligence (AI). The tune was created by Flow Machines, a system Sony taught to make music by feeding it 13,000 samples from different genres. Although the software is capable of creating the lead sheet, a human composer instructed it to produce a record in the style of The Beatles and wrote the lyrics.

artificial intelligence, flow machine, machine learning, (15 more...)

#artificialintelligence

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)


The AI that brought The Beatles and Cole Porter back to life: Listen to Sony software that can create new songs in the style of any artist

Daily Mail - Science & tech Oct-11-2016, 01:40:25 GMT

artificial intelligence, flow machine, machine learning, (14 more...)

Daily Mail - Science & tech

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.32)
